NSF PAR Search | NSF Public Access Repository

How to Practice VQA on a Resource-limited Target Domain

https://doi.org/10.1109/wacv56688.2023.00443

Zhang, Mingda; Hwa, Rebecca; Kovashka, Adriana (January 2023, 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV))

Full Text Available
Learning to Overcome Noise in Weak Caption Supervision for Object Detection

https://doi.org/10.1109/TPAMI.2022.3187350

Unal, Mesut Erhan; Ye, Keren; Zhang, Mingda; Thomas, Christopher; Kovashka, Adriana; Li, Wei; Qin, Danfeng; Berent, Jesse (June 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence)

Full Text Available
Domain-robust VQA with diverse datasets and methods but no target labels

Zhang, Mingda; Maidment, Tristan; Diab, Ahmad; Kovashka, Adriana; Hwa, Rebecca (June 2021, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
Full Text Available
Breaking Shortcuts by Masking for Robust Visual Reasoning

https://doi.org/10.1109/WACV48630.2021.00356

Ye, Keren; Zhang, Mingda; Kovashka, Adriana (January 2021, Proceedings of the Winter Conference on Applications of Computer Vision (WACV))
Full Text Available
Story Completion with Explicit Modeling of Commonsense Knowledge

https://doi.org/10.1109/CVPRW50498.2020.00196

Zhang, Mingda; Ye, Keren; Hwa, Rebecca; Kovashka, Adriana (June 2020, The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020)

Growing up with bedtime tales, even children can easily tell how a story should develop, but selecting a coherent and reasonable ending for a story is still not easy for machines. Successfully choosing an ending requires not only detailed analysis of the context, but also commonsense reasoning and basic knowledge. Previous work has shown that language models trained on very large corpora can capture common sense in an implicit and hard-to-interpret way. We explore another direction and present a novel method that explicitly incorporates commonsense knowledge from a structured dataset, and demonstrate its potential for improving story completion.
Full Text Available
An Enhanced Deep Convolutional Model for Spatiotemporal Image Fusion

https://doi.org/10.3390/rs11242898

Tan, Zhenyu; Di, Liping; Zhang, Mingda; Guo, Liying; Gao, Meiling (December 2019, Remote Sensing)

Earth observation data with high spatiotemporal resolution are critical for dynamic monitoring and prediction in geoscience applications. However, owing to technical and budget limitations, it is not easy to acquire satellite images with both high spatial and high temporal resolution. Spatiotemporal image fusion techniques provide a feasible and economical solution for generating dense-time data with high spatial resolution, pushing the limits of current satellite observation systems. Among the various existing fusion algorithms, deep-learning-based models show promise, offering higher accuracy and robustness. This paper refines and improves the existing deep convolutional spatiotemporal fusion network (DCSTFN) to further boost model prediction accuracy and enhance image quality. The contributions of this paper are twofold. First, the fusion result is improved considerably with a brand-new network architecture and a novel compound loss function. Experiments conducted in two different areas demonstrate these improvements through comparison with existing algorithms. The enhanced DCSTFN model shows superior performance, with higher accuracy, visual quality, and robustness. Second, the advantages and disadvantages of existing deep-learning-based spatiotemporal fusion models are comparatively discussed, and a network design guide for spatiotemporal fusion is provided as a reference for future research. These comparisons and guidelines are summarized from a number of actual experiments and have the potential to be applied to other image sources with customized spatiotemporal fusion networks.
Full Text Available
Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text

Zhang, Mingda; Hwa, Rebecca; Kovashka, Adriana (September 2018, British Machine Vision Conference (BMVC))

Images and text in advertisements interact in complex, non-literal ways. The two channels are usually complementary, with each channel telling a different part of the story. Current approaches, such as image captioning methods, only examine literal, redundant relationships, where image and text show exactly the same content. To understand more complex relationships, we first collect a dataset of advertisement interpretations for whether the image and slogan in the same visual advertisement form a parallel (conveying the same message without literally saying the same thing) or non-parallel relationship, with the help of workers recruited on Amazon Mechanical Turk. We develop a variety of features that capture the creativity of images and the specificity or ambiguity of text, as well as methods that analyze the semantics within and across channels. We show that our method outperforms standard image-text alignment approaches on predicting the parallel/non-parallel relationship between image and text.
Full Text Available
